Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling
Conditional Random Fields (CRFs) constitute a popular and efficient approach
for supervised sequence labelling. CRFs can cope with large description spaces
and can integrate some form of structural dependency between labels. In this
contribution, we address the issue of efficient feature selection for CRFs
based on imposing sparsity through an L1 penalty. We first show how sparsity of
the parameter set can be exploited to significantly speed up training and
labelling. We then introduce coordinate descent parameter update schemes for
CRFs with L1 regularization. We finally provide some empirical comparisons of
the proposed approach with state-of-the-art CRF training strategies. In
particular, it is shown that the proposed approach is able to exploit
sparsity to speed up processing and hence potentially handle
higher-dimensional models.
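The coordinate descent updates mentioned in this abstract rely on a property of L1 penalties: minimizing along a single coordinate has a closed form based on soft thresholding, and that form sets many weights exactly to zero. The sketch below does not reproduce the paper's CRF updates. It swaps in the simpler L1-penalized least-squares (lasso) objective (1/2n)·||y - Xw||² + λ·||w||₁ so that the soft-thresholding step is visible. The names `soft_threshold` and `lasso_coordinate_descent` and the toy data are illustrative assumptions.

```python
def soft_threshold(a: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink a towards zero by t, clipping to exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2n))*||y - Xw||^2 + lam*||w||_1.

    X is a list of n rows with d features each; y is a list of n targets.
    Returns the weights as a list of d floats, many of which may be exactly 0.
    (Illustrative lasso objective, not the CRF log-likelihood.)
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # y - Xw, kept up to date incrementally (w starts at 0)
    for _ in range(n_sweeps):
        for j in range(d):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # feature never fires: its weight stays at 0
            # Correlation of feature j with the residual computed without feature j.
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_wj
    return w


if __name__ == "__main__":
    X = [[1, 0], [0, 1], [1, 0], [0, 1]]
    y = [2.0, 0.1, 2.0, 0.1]
    print(lasso_coordinate_descent(X, y, lam=0.1))
```

On this toy data the second weight is driven exactly to zero, because its correlation with the residual (0.05) falls below λ = 0.1. The first weight is shrunk from 2 to about 1.8. Exact zeros of this kind are what allow a sparse model to skip inactive features during training and labelling.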
An Operational SSL HF System (MILCOM 2007)
8 pages. International audience. This paper presents an operational HF (3-30 MHz) system designed for single-site localization (SSL) of transmitters involved in trans-horizon radio links. It combines the estimation of the directions of arrival of incident radio waves refracted by the ionosphere with ray-tracing software based on the PRIME model of the ionospheric channel. Direction finding is implemented on an array of non-identical, polarization-sensitive sensors. A specific version of the MUSIC algorithm jointly estimates the angles of arrival (azimuth and elevation) of the incident waves and their polarization. Statistics of the angles of arrival (mean values and standard deviations) are the input to the ray-tracing software, which computes the estimated position of the transmitter. Numerous radio links have been tested over distances of up to 2000 km. Good agreement is observed between the true and estimated positions of the transmitters, with a standard localization error of less than 10% of the distance to the receiving system.
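The basic geometry of single-site localization can be sketched as follows. The measured azimuth gives the bearing of the transmitter, and the elevation angle combined with an ionospheric reflection height gives its range. This sketch replaces the PRIME-based ray tracing used in the paper with a textbook simplification: a single hop off a mirror at a fixed virtual height above a spherical Earth. The function names, the 6371 km Earth radius, and the single-hop assumption are all illustrative.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def single_hop_ground_range(elevation_deg: float, virtual_height_km: float,
                            radius_km: float = EARTH_RADIUS_KM) -> float:
    """Ground distance of a one-hop path reflected by a mirror at the virtual height.

    The law of sines in the triangle (Earth centre, receiver, reflection point) gives
    the half-hop central angle: pi/2 - elevation - asin(R*cos(elevation) / (R + h)).
    """
    beta = math.radians(elevation_deg)
    half_angle = (math.pi / 2 - beta
                  - math.asin(radius_km * math.cos(beta) / (radius_km + virtual_height_km)))
    return 2.0 * radius_km * half_angle


def destination(lat_deg: float, lon_deg: float, azimuth_deg: float, distance_km: float,
                radius_km: float = EARTH_RADIUS_KM) -> Tuple[float, float]:
    """Point reached from (lat, lon) along a great circle with the given initial azimuth."""
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    az = math.radians(azimuth_deg)
    delta = distance_km / radius_km  # angular distance in radians
    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(az))
    lon2 = lon1 + math.atan2(math.sin(az) * math.sin(delta) * math.cos(lat1),
                             math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    lon2_deg = (math.degrees(lon2) + 540.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.degrees(lat2), lon2_deg


def locate_transmitter(rx_lat: float, rx_lon: float, azimuth_deg: float,
                       elevation_deg: float, virtual_height_km: float) -> Tuple[float, float]:
    """Single-site estimate: bearing from the azimuth, range from elevation and virtual height."""
    d = single_hop_ground_range(elevation_deg, virtual_height_km)
    return destination(rx_lat, rx_lon, azimuth_deg, d)
```

The model reflects the direction of the effect: lower elevation angles map to longer ranges. For example, with a 300 km virtual height, a 10° elevation corresponds to roughly 2,000 km. The paper's ray tracing instead accounts for a modelled ionosphere, which a fixed mirror height cannot capture.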
The fourth proportional as an inductive principle: a proposal and its application to learning morphology
We present a model of learning by analogy that exploits the notion of formal analogical proportions. This approach presupposes that these proportions can be given a meaning and that their computation can be implemented efficiently. We propose an algebraic definition of this notion, valid for the structures commonly used for linguistic representations: words over a finite alphabet, attribute-value structures, and labelled trees. We then present an application to a concrete task: learning to morphologically analyze unknown orthographic forms. Experimental results on several lexicons make it possible to assess the validity of our approach.
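The idea of a fourth proportional, that is, solving a : b :: c : x for x, can be illustrated on strings. The sketch below covers only the simplest case, where a and b share a prefix and differ by a suffix rewrite. It is therefore much weaker than the algebraic definition described in this entry, which also covers attribute-value structures and labelled trees. The function names and the `+Pl` analysis tag are hypothetical.

```python
from typing import Optional


def common_prefix_length(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k


def solve_suffix_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : x when a -> b is a suffix rewrite u -> v.

    Writes a = p + u and b = p + v with p the longest common prefix, then applies
    the same rewrite to c. Returns None if c does not end with u.
    """
    k = common_prefix_length(a, b)
    u, v = a[k:], b[k:]
    if not c.endswith(u):
        return None
    return c[: len(c) - len(u)] + v  # not c[:-len(u)], which is wrong when u == ""


if __name__ == "__main__":
    print(solve_suffix_analogy("cats", "cat+Pl", "dogs"))  # dog+Pl
    print(solve_suffix_analogy("sing", "sang", "ring"))    # rang
```

The first call shows how such proportions can be used for morphological analysis. A known pair (surface form, analysis) is used to analyze an unseen form that follows the same pattern.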
Measuring text readability with machine comprehension: a pilot study
International audience. This article studies the relationship between text readability indices and automatic machine comprehension systems. Our hypothesis is that the simpler a text is, the better it should be understood by a machine. We therefore expect a strong correlation between readability levels on the one hand and the performance of automatic reading systems on the other. We test this hypothesis with several comprehension systems based on language models of varying strength, measuring this correlation on two corpora of journalistic texts. Our results suggest that this correlation is rather weak and that existing comprehension systems are far from showing a gradual improvement in performance as text complexity decreases.
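The correlation this study measures could be computed as below. The abstract does not say which coefficient was used. This sketch assumes Spearman's rank correlation, a natural choice when readability is expressed as ordinal levels. Ties are handled with average ranks. The function names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, `x` would hold the readability level of each text and `y` the comprehension system's score on that text. A value near 1 would support the hypothesis, while the abstract reports a rather weak correlation.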
Learning the Structure of Variable-Order CRFs: a finite-state perspective
The computational complexity of linear-chain Conditional Random Fields (CRFs) makes it difficult to deal with very large label sets and long-range dependencies. Such situations are not rare: they arise, for instance, with morphologically rich languages or joint labelling tasks. Here we extend recent proposals for variable-order CRFs. Using an effective finite-state representation of variable-length dependencies, we propose new ways to perform feature selection at a large scale and report experimental results in which our approach outperforms strong baselines on a tagging task.
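In a variable-order model, each label history is mapped to the longest suffix of that history that belongs to the set of retained label contexts. These contexts play the role of states in a finite-state representation. The sketch below shows only that lookup. It assumes the selected contexts are given as a set of label tuples. The function name and the toy tag set are illustrative, and the paper's automaton construction and feature selection are not reproduced.

```python
from typing import FrozenSet, Sequence, Tuple

Context = Tuple[str, ...]


def longest_context(history: Sequence[str], contexts: FrozenSet[Context],
                    max_order: int) -> Context:
    """Return the longest suffix of `history` (at most max_order labels) found in `contexts`.

    The empty tuple stands for the order-0 context, which is always available.
    """
    for k in range(min(max_order, len(history)), 0, -1):
        suffix = tuple(history[-k:])
        if suffix in contexts:
            return suffix
    return ()


if __name__ == "__main__":
    selected = frozenset({("DET",), ("DET", "ADJ"), ("NOUN",)})
    print(longest_context(["VERB", "DET", "ADJ"], selected, max_order=3))  # ('DET', 'ADJ')
    print(longest_context(["VERB", "DET"], selected, max_order=3))         # ('DET',)
    print(longest_context(["NOUN", "ADJ"], selected, max_order=3))         # ()
```

Because only selected contexts act as states, the number of states grows with the number of retained features rather than with |labels|^order. This is what makes higher orders affordable when feature selection keeps the context set small.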
Evaluating Subtitle Segmentation for End-to-end Generation Systems
Subtitles appear on screen as short pieces of text, segmented based on formal constraints (length) and syntactic/semantic criteria. Subtitle segmentation can be evaluated with sequence segmentation metrics against a human reference. However, standard
segmentation metrics cannot be applied when systems generate outputs that differ from the reference, e.g. with end-to-end
subtitling systems. In this paper, we study ways to conduct reference-based evaluations of segmentation accuracy irrespective
of the textual content. We first conduct a systematic analysis of existing metrics for evaluating subtitle segmentation. We then
introduce Sigma, a new Subtitle Segmentation Score derived from an approximate upper-bound of BLEU on segmentation
boundaries, which allows us to disentangle the effect of good segmentation from text quality. To compare Sigma with existing
metrics, we further propose a boundary projection method from imperfect hypotheses to the true reference. Results show that
all metrics are able to reward high-quality output, but for similar outputs, system ranking depends on each metric's sensitivity
to error type. Our thorough analyses suggest that Sigma is a promising segmentation metric, but its reliability relative to other
segmentation metrics remains to be validated through correlations with human judgements.
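For comparison, a standard existing way to score segmentation is to match boundary positions between hypothesis and reference. The sketch below assumes that line and block breaks are written as `<eol>` and `<eob>` tokens inside the text. It computes boundary precision, recall, and F1. It illustrates a conventional boundary metric, not Sigma. It also only makes sense when the hypothesis contains the same words as the reference, which is the limitation this abstract raises for end-to-end systems. The marker names and function names are assumptions.

```python
from typing import Iterable, Set, Tuple

BREAK_TOKENS = frozenset({"<eol>", "<eob>"})  # assumed line/block break markers


def boundaries(tokens: Iterable[str]) -> Set[Tuple[int, str]]:
    """Set of (number of words before the break, break type) pairs."""
    found = set()
    n_words = 0
    for tok in tokens:
        if tok in BREAK_TOKENS:
            found.add((n_words, tok))
        else:
            n_words += 1
    return found


def boundary_f1(hypothesis: str, reference: str) -> Tuple[float, float, float]:
    """Precision, recall and F1 of hypothesis breaks against reference breaks."""
    hyp = boundaries(hypothesis.split())
    ref = boundaries(reference.split())
    tp = len(hyp & ref)
    precision = tp / len(hyp) if hyp else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Keying boundaries on word positions is exactly what breaks down once the hypothesis text differs from the reference. That mismatch motivates the boundary projection method described in the abstract.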
BiSync: A Bilingual Editor for Synchronized Monolingual Texts
In our globalized world, a growing number of situations arise where people
are required to communicate in one or several foreign languages. In the case of
written communication, users with a good command of a foreign language may find
assistance from computer-aided translation (CAT) technologies. These
technologies often allow users to access external resources, such as
dictionaries, terminologies or bilingual concordancers, thereby interrupting
and considerably hindering the writing process. In addition, CAT systems assume
that the source sentence is fixed and also restrict the possible changes on the
target side. In order to make the writing process smoother, we present BiSync,
a bilingual writing assistant that allows users to freely compose text in two
languages while keeping the two monolingual texts synchronized. We also
include additional functionalities, such as the display of alternative prefix
translations and paraphrases, which are intended to facilitate the authoring of
texts. We detail the model architecture used for synchronization and evaluate
the resulting tool, showing that high accuracy can be attained with limited
computational resources. The interface and models are publicly available at
https://github.com/jmcrego/BiSync and a demonstration video can be watched on
YouTube at https://youtu.be/_l-ugDHfNgU.
Comment: ACL 2023 System Dem
Cross-lingual alignment transfer: a chicken-and-egg story?
International audience. In this paper, we challenge a basic assumption of many cross-lingual transfer techniques, namely the availability of word-aligned parallel corpora, and consider ways to handle situations in which such resources do not exist. We show experimentally that, here again, weakly supervised cross-lingual learning techniques can prove useful once adapted to transfer knowledge across pairs of languages.
Reassessing the proper place of man and machine in translation: a pre-translation scenario
Traditionally, human-machine interaction aimed at improving machine translation (MT) output takes place ex post and consists of correcting this output. In this work, we investigate other modes of intervention in the MT process. We propose a Pre-Edition protocol that involves: (a) the detection of MT difficulties; (b) the resolution of those difficulties by a human translator, who provides their translations (pre-translation); and (c) the integration of the obtained information prior to automatic translation. This approach can accommodate the interaction preferences of some translators and can be particularly useful in production environments, where more control over output quality is needed. Early resolution of translation difficulties can prevent downstream errors, thus improving the final translation quality "for free". We show that translation difficulty can be reliably predicted for English for various source units. We demonstrate that the pre-translation information can be successfully exploited by an MT system and that the indirect effects are genuine, accounting for around 16% of the total improvement. We also provide a study of the human effort involved in the resolution process.
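The three steps of the protocol (detect, resolve, integrate) can be sketched as a small pipeline over word-level source units. Everything here is hypothetical. The difficulty scores stand in for the paper's predictor, the `threshold` value is arbitrary, and the `resolve` callback stands in for the human translator. The `<pt tr="...">` markup is an invented format for passing pre-translations to an MT system, not the interface of any particular MT toolkit.

```python
import html
from typing import Callable, List, Sequence


def pre_translate(tokens: Sequence[str], difficulty: Sequence[float], threshold: float,
                  resolve: Callable[[str], str]) -> str:
    """Hypothetical pre-translation pipeline over word-level source units.

    (a) detect: tokens whose predicted difficulty exceeds `threshold`;
    (b) resolve: obtain a human translation for each of them via `resolve`;
    (c) integrate: wrap them in an illustrative <pt tr="..."> tag before MT.
    """
    if len(tokens) != len(difficulty):
        raise ValueError("one difficulty score per token is required")
    out: List[str] = []
    for tok, score in zip(tokens, difficulty):
        if score > threshold:
            translation = html.escape(resolve(tok), quote=True)
            out.append(f'<pt tr="{translation}">{html.escape(tok)}</pt>')
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    human = {"bank": "banque"}
    print(pre_translate(["the", "bank", "closed"], [0.1, 0.9, 0.2], 0.5, human.__getitem__))
    # the <pt tr="banque">bank</pt> closed
```

Only the flagged units reach the translator, which keeps human effort proportional to the number of predicted difficulties. The "indirect effects" reported in the abstract would show up as changes in how the MT system translates the untagged context around such units.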